IN THE CLAIMS

1. (Original) A method of processing audio-based data associated with a particular language,
the method comprising the steps of: 

storing the audio-based data; 

generating a textual representation of the audio-based data, the textual representation being 
in the form of one or more semantic units corresponding to the audio-based data; and 

indexing the one or more semantic units and storing the one or more indexed semantic units 
for use in searching the stored audio-based data in response to a user query. 

2. (Original) The method of claim 1, wherein the semantic unit is a syllable. 

3. (Original) The method of claim 2, wherein the syllable is a phonetically based syllable. 

4. (Original) The method of claim 1, wherein the semantic unit is a morpheme.

5. (Original) The method of claim 1, wherein the generating step comprises decoding the 
audio-based data in accordance with a speech recognition system. 

6. (Original) The method of claim 5, wherein the speech recognition system employs a 
semantic unit based language model. 

7. (Original) The method of claim 1, wherein the indexing step comprises time stamping the
one or more semantic units. 

8. (Original) The method of claim 1, wherein the searching step comprises:

processing the user query to generate one or more semantic units representing the
information that the user seeks to retrieve;

searching the one or more indexed semantic units to find a substantial match with the one 
or more semantic units associated with the user query; and

retrieving one or more segments of the audio-based data using the one or more indexed
semantic units that match the one or more semantic units associated with the user query. 

9. (Original) The method of claim 8, wherein the searching step further comprises presenting 
the retrieved data to the user. 

10. (Original) The method of claim 1, wherein the particular language is an Asian based 
language. 

11. (Original) The method of claim 10, wherein the particular language is Chinese. 

12. (Original) The method of claim 11, wherein the semantic unit is a Chinese character. 

13. (Original) The method of claim 1, wherein the particular language is a Slavic based 
language. 

14. (Original) The method of claim 1, wherein the one or more semantic units are indexed 
according to speaker attributes. 

15. (Original) The method of claim 1, wherein the one or more semantic units are indexed 
according to at least one of when the audio based data was produced and where the audio based data 
was produced. 

16. (Original) The method of claim 1, further comprising the step of storing video based data
associated with the audio based data for use in searching the stored audio based data and the video 
based data in response to a user query. 



17. (Original) The method of claim 16, wherein the searching step includes a hierarchical 
search routine.

18. (Original) The method of claim 1, wherein the generating step comprises
stenographically transcribing the audio-based data to generate the textual representation. 

19. (Original) Apparatus for processing audio-based data associated with a particular 
language, the apparatus comprising: 

at least one processor operative to: (i) store the audio-based data; (ii) generate a textual
representation of the audio-based data, the textual representation being in the form of one or more 
semantic units corresponding to the audio-based data; and (iii) index the one or more semantic units 
and store the one or more indexed semantic units for use in searching the stored audio-based data 
in response to a user query. 

20. (Original) An audio-based data indexing and retrieval system for processing audio-based 
data associated with a particular language, the system comprising: 

memory for storing the audio-based data; 

a semantic unit based speech recognition system for generating a textual representation of 
the audio-based data, the textual representation being in the form of one or more semantic units 
corresponding to the audio-based data; 

an indexing and storage module, operatively coupled to the semantic unit based speech 
recognition system and the memory, for indexing the one or more semantic units and storing the one 
or more indexed semantic units; and 

a search engine, operatively coupled to the indexing and storage module and the memory, 
for searching the one or more indexed semantic units for a match with one or more semantic units 
associated with a user query, and for retrieving the stored audio based data based on the one or more 
indexed semantic units. 

21. (New) The method of claim 5, wherein the speech recognition system employs a syllable
language model.

22. (New) The method of claim 21, wherein production of the syllable language model
comprises the steps of: 

transcribing audio data to generate syllables; 

deriving conditional probabilities of distribution based on the generated syllables; and

using syllable counts and the conditional probabilities to construct the syllable language
model.

23. (New) The method of claim 1, wherein the user query comprises a word. 

24. (New) The method of claim 23, wherein the searching step further comprises 
transforming the word into a sequence of syllables using a text-to-phonetic syllable map. 

25. (New) The method of claim 3, wherein a phonetically-based syllable comprises a 
toneme. 

26. (New) The method of claim 3, wherein two or more different pronunciations are 
associated with a phonetically-based syllable. 

27. (New) The method of claim 1, wherein the generating step comprises producing the 
textual representation via stenography. 

28. (New) The method of claim 1, wherein the searching step comprises use of a hierarchical
index.

29. (New) The method of claim 1, wherein the searching step comprises use of an automatic
boundary marking system. 
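
For illustration only, and forming no part of the claims, the indexing and searching steps recited in claims 1, 7, 8 and 24 can be sketched as follows. This is a minimal sketch under stated assumptions: the decoded syllables and their time stamps are supplied by hand rather than by a speech recognition system, the names `SYLLABLE_MAP`, `build_index` and `search` are hypothetical, the map entries are invented examples, and an exact contiguous match stands in for the "substantial match" of claim 8.

```python
from collections import defaultdict

# Hypothetical text-to-phonetic syllable map (cf. claim 24); entries are
# invented for illustration and are not taken from the application.
SYLLABLE_MAP = {
    "market": ["mar", "ket"],
    "report": ["re", "port"],
}


def build_index(decoded):
    """Index time-stamped syllables (cf. claims 1 and 7).

    `decoded` is a list of (syllable, start_time_seconds) pairs, assumed to
    come from a recognizer. Returns syllable -> list of (position, start_time).
    """
    index = defaultdict(list)
    for position, (syllable, start_time) in enumerate(decoded):
        index[syllable].append((position, start_time))
    return index


def search(index, decoded, word):
    """Return start times where the word's syllables occur contiguously (cf. claim 8).

    Assumption: an exact sequence match is used as a simple stand-in for a
    "substantial match".
    """
    query = SYLLABLE_MAP.get(word.lower())
    if not query:
        return []
    hits = []
    # Look up candidate positions by the first syllable, then verify the rest.
    for position, start_time in index.get(query[0], []):
        window = [s for s, _ in decoded[position:position + len(query)]]
        if window == query:
            hits.append(start_time)
    return hits


# Hand-written stand-in for recognizer output: (syllable, start time in seconds).
decoded = [
    ("the", 0.0), ("mar", 0.4), ("ket", 0.7),
    ("re", 1.1), ("port", 1.3), ("mar", 2.0), ("ket", 2.3),
]
index = build_index(decoded)
```

Calling `search(index, decoded, "market")` returns the time stamps at which the syllable sequence begins, which a retrieval step could use to locate the corresponding segments of the stored audio-based data.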



